This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[BesTLA] The initial SYCL support #229

Merged: 25 commits, Apr 25, 2024


ThanatosShinji
Contributor

Type of Change

  1. Compile BesTLA with ICX
  2. Add SGEMM and HGEMM
  3. Add symmetric S4 (signed 4-bit) weight support for block sizes that are a multiple of 32 (blocksize % 32 == 0)
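
For readers unfamiliar with symmetric S4 weights, here is a minimal plain C++ sketch of block-wise symmetric 4-bit quantization. It is an illustration only: the `S4SymBlocks` struct, the function names, the nibble packing order, and the choice of `scale = max|w| / 7` are assumptions made for this example and are not taken from the BesTLA SYCL kernels. Only the `blocksize % 32 == 0` constraint comes from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch; not BesTLA's packed-weight format.
struct S4SymBlocks {
  std::vector<std::uint8_t> packed;  // two 4-bit values per byte, low nibble first
  std::vector<float> scales;         // one scale per block, no zero point
  std::size_t block_size = 0;
};

// Symmetric quantization: q = round(w / scale), clamped to the signed 4-bit
// range [-8, 7]. The scale maps the largest |w| in each block to +/-7.
inline S4SymBlocks quantize_s4_sym(const std::vector<float>& w, std::size_t block_size) {
  assert(block_size > 0 && block_size % 32 == 0);  // constraint stated in the PR
  assert(w.size() % block_size == 0);              // this sketch has no tail handling
  S4SymBlocks out;
  out.packed.assign(w.size() / 2, 0);
  out.scales.assign(w.size() / block_size, 0.f);
  out.block_size = block_size;
  for (std::size_t b = 0; b < out.scales.size(); ++b) {
    const std::size_t begin = b * block_size;
    float amax = 0.f;
    for (std::size_t i = begin; i < begin + block_size; ++i)
      amax = std::max(amax, std::fabs(w[i]));
    const float scale = amax > 0.f ? amax / 7.f : 1.f;
    out.scales[b] = scale;
    for (std::size_t i = begin; i < begin + block_size; ++i) {
      const int q = std::max(-8, std::min(7, static_cast<int>(std::lround(w[i] / scale))));
      const std::uint8_t nibble = static_cast<std::uint8_t>(q & 0xF);
      if (i % 2 == 0)
        out.packed[i / 2] = nibble;  // even index: low nibble
      else
        out.packed[i / 2] = static_cast<std::uint8_t>(out.packed[i / 2] | (nibble << 4));
    }
  }
  return out;
}

// Recover an approximate float weight: sign-extend the nibble, multiply by the block scale.
inline float dequantize_s4_sym(const S4SymBlocks& q, std::size_t idx) {
  const std::uint8_t byte = q.packed[idx / 2];
  const int nibble = (idx % 2 == 0) ? (byte & 0xF) : (byte >> 4);
  const int value = nibble >= 8 ? nibble - 16 : nibble;
  return static_cast<float>(value) * q.scales[idx / q.block_size];
}
```

Because there is no zero point, dequantization is a single multiply per weight, and the round-trip error of an unclamped value is at most half a scale step.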

Support Matrix

Row-major only

| A | B | B Layout |
| --- | --- | --- |
| Fp16 | Fp16 | NT |
| Fp32 | Fp32 | NT |
| Fp32 | S4 Sym | NT/T |
| Fp16 | S4 Sym | NT/T |

NT: Non Transpose
T: Transpose
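
To make the layout column concrete, here is a small reference GEMM in plain C++ showing how B would be indexed in each case. It assumes that NT means B is stored row-major as K x N and T means B is stored row-major as N x K; the PR does not spell this out, so treat that reading (and the `gemm_ref` name) as an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference C = A * B with row-major storage. A is M x K, C is M x N.
// b_transposed == false (NT): B stored as K x N, element (k, n) at b[k * N + n].
// b_transposed == true  (T):  B stored as N x K, element (k, n) at b[n * K + k].
inline std::vector<float> gemm_ref(const std::vector<float>& a, const std::vector<float>& b,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   bool b_transposed) {
  assert(a.size() == M * K && b.size() == K * N);
  std::vector<float> c(M * N, 0.f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.f;
      for (std::size_t k = 0; k < K; ++k) {
        const float bkn = b_transposed ? b[n * K + k] : b[k * N + n];
        acc += a[m * K + k] * bkn;
      }
      c[m * N + n] = acc;
    }
  }
  return c;
}
```

Both layouts produce the same C; only the memory access pattern over B differs.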

Performance

| Type | Problem | A750 (TFLOPS) | MTL 8 cores (TFLOPS) |
| --- | --- | --- | --- |
| HGEMM | 4096x4096x4096 | 18.4 | 4.8 |
| SGEMM | 4096x4096x4096 | 10.4 | 3.0 |
| HGEMM+S4 Group=128 | 1x4096x4096 | 1.05 | 0.324 |
| HGEMM+S4 Group=128 | 1x12288x4096 | 1.45 | 0.341 |

~88 GB/s bandwidth on MTL

~373 GB/s bandwidth on A750
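
The PR does not say how the bandwidth figures were obtained. One reading that appears consistent with them is the weight traffic implied by the 1x12288x4096 HGEMM+S4 row. The sketch below works through that arithmetic under several assumptions that are not stated in the PR: the problem size is M x N x K, FLOPs are counted as 2*M*N*K, only the S4 weights and their scales are counted as memory traffic, and each group of 128 weights carries one 2-byte (fp16) scale. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for an N x K weight matrix stored as 4-bit values plus one scale per
// `group` weights. scale_bytes = 2.0 assumes fp16 scales (an assumption).
inline double s4_weight_bytes(std::size_t N, std::size_t K, std::size_t group,
                              double scale_bytes = 2.0) {
  assert(group > 0 && K % group == 0);
  const double elems = static_cast<double>(N) * static_cast<double>(K);
  return elems * 0.5 + elems / static_cast<double>(group) * scale_bytes;
}

// Effective bandwidth in GB/s implied by a measured throughput in TFLOPS for
// an M x N x K problem, counting only weight and scale traffic.
inline double implied_bandwidth_gbs(std::size_t M, std::size_t N, std::size_t K,
                                    std::size_t group, double tflops) {
  const double flops = 2.0 * static_cast<double>(M) * static_cast<double>(N) *
                       static_cast<double>(K);
  const double seconds = flops / (tflops * 1e12);
  return s4_weight_bytes(N, K, group) / seconds / 1e9;
}
```

With these assumptions, `implied_bandwidth_gbs(1, 12288, 4096, 128, 1.45)` evaluates to about 374 GB/s and `implied_bandwidth_gbs(1, 12288, 4096, 128, 0.341)` to about 88 GB/s, close to the A750 and MTL figures quoted above.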

Limitations

  1. No tail processing for N and K
  2. Tail processing for M is not optimized
  3. Epilogue is not tested with other activation functions
  4. Transpose B is not optimized
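
Since N and K have no tail handling, a caller would presumably need to check or pad those dimensions before dispatching. The helpers below are a hypothetical sketch of such a check. The tile sizes are parameters because the PR does not state the actual kernel tile sizes, and neither function is part of BesTLA.

```cpp
#include <cassert>
#include <cstddef>

// Round x up to the next multiple of tile (tile > 0).
inline std::size_t round_up(std::size_t x, std::size_t tile) {
  assert(tile > 0);
  return (x + tile - 1) / tile * tile;
}

// True when N and K are exact multiples of the given (hypothetical) tile
// sizes, so that no tail handling would be required.
inline bool has_no_nk_tail(std::size_t N, std::size_t K,
                           std::size_t tile_n, std::size_t tile_k) {
  assert(tile_n > 0 && tile_k > 0);
  return N % tile_n == 0 && K % tile_k == 0;
}
```

For example, with an assumed tile of 32, `has_no_nk_tail(4100, 4096, 32, 32)` is false and `round_up(4100, 32)` gives the padded size 4128.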

@airMeng merged commit dfdfb0f into intel:main on Apr 25, 2024
12 checks passed